Irfan Essa

HierSum: A Global and Local Attention Mechanism for Video Summarization

Apr 25, 2025
Via arXiv

Leveraging Procedural Knowledge and Task Hierarchies for Efficient Instructional Video Pre-training

Feb 24, 2025
Via arXiv

MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation

Feb 18, 2025
Via arXiv

Calibrated Multi-Preference Optimization for Aligning Diffusion Models

Feb 04, 2025
Via arXiv

Learning Complex Non-Rigid Image Edits from Multimodal Conditioning

Dec 13, 2024
Via arXiv

AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset

Nov 23, 2024
Via arXiv

Exploring Efficient Foundational Multi-modal Models for Video Summarization

Oct 09, 2024
Via arXiv

Mamba Fusion: Learning Actions Through Questioning

Sep 17, 2024
Via arXiv

Limitations in Employing Natural Language Supervision for Sensor-Based Human Activity Recognition -- And Ways to Overcome Them

Aug 21, 2024
Via arXiv

Cropper: Vision-Language Model for Image Cropping through In-Context Learning

Aug 14, 2024
Via arXiv